NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Approximate Equivariance in Reinforcement Learning

Park, Jung Yeon; Bhatt, Sujay; Zeng, Sihan; Wong, Lawson LS; Koppel, Alec; Ganesh, Sumitra; Walters, Robin (May 2025, Artificial Intelligence and Statistics (AISTATS))

Free, publicly-accessible full text available May 6, 2026
Approximate Equivariance in Reinforcement Learning

Park, Jung Yeon; Bhatt, Sujay; Zeng, Sihan; Wong, Lawson LS; Koppel, Alec; Ganesh, Sumitra; Walters, Robin (January 2025, International Conference on Artificial Intelligence and Statistics)

Full Text Available
Controlled Sequential Information Fusion with Social Sensors

https://doi.org/10.1109/TAC.2020.3046024

Bhatt, Sujay; Krishnamurthy, Vikram (January 2021, IEEE Transactions on Automatic Control)
null (Ed.)
Full Text Available
Policy Gradient using Weak Derivatives for Reinforcement Learning

https://doi.org/10.1109/CDC40024.2019.9029403

Bhatt, Sujay; Koppel, Alec; Krishnamurthy, Vikram (December 2019, 2019 IEEE 58th Conference on Decision and Control)

This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes the gradient of the value function into two factors: the score function and the Q-function. This paper presents four results: (i) an alternative policy gradient theorem using weak (measure-valued) derivatives instead of score-function is established; (ii) the stochastic gradient estimates thus derived are shown to be unbiased and to yield algorithms that converge almost surely to stationary points of the non-convex value function of the reinforcement learning problem; (iii) the sample complexity of the algorithm is derived and is shown to be O(1/ k); (iv) finally, the expected variance of the gradient estimates obtained using weak derivatives is shown to be lower than those obtained using the popular score-function approach. Experiments on OpenAI gym pendulum environment illustrate the superior performance of the proposed algorithm.
more » « less
Full Text Available
Adaptive Polling in Hierarchical Social Networks Using Blackwell Dominance

https://doi.org/10.1109/TSIPN.2019.2918442

Bhatt, Sujay; Krishnamurthy, Vikram (September 2019, IEEE Transactions on Signal and Information Processing over Networks)

Full Text Available
Multiple stopping time POMDPs: Structural results & application in interactive advertising on social media

https://doi.org/10.1016/j.automatica.2018.06.013

Krishnamurthy, Vikram; Aprem, Anup; Bhatt, Sujay (September 2018, Automatica)

Full Text Available
Incentivized Information Fusion with Social Sensors

https://doi.org/10.1145/3199524.3199539

Bhatt, Sujay; Krishnamurthy, Vikram (March 2018, ACM SIGMETRICS Performance Evaluation Review)

Full Text Available
Controlled information fusion with risk-averse CVaR social sensors

https://doi.org/10.1109/CDC.2017.8264037

Bhatt, Sujay; Krishnamurthy, Vikram (December 2017, 56th IEEE Conference on Decision and Control)

Full Text Available

Search for: All records